Factor oracle, Suffix oracle (Extended Abstract)

Authors

  • Cyril Allauzen
  • Maxime Crochemore
Abstract

We introduce a new automaton on a word p, a sequence of letters taken from an alphabet Σ, that we call the factor oracle. This automaton is acyclic, recognizes at least the factors of p, has m + 1 states (where m is the length of p) and a linear number of transitions. We give an on-line construction algorithm for the factor oracle. The tight links between this structure and the suffix automaton allow us to introduce a second structure, the suffix oracle. We use these two structures in string matching algorithms that, according to our experimental results, we conjecture to be optimal. These algorithms are as efficient as existing ones while using less memory and being easier to implement.
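
The on-line construction and the two automata described in the abstract can be illustrated with a minimal Python sketch. It is our own illustration under stated assumptions, not the authors' code: states are the integers 0 to m, each state keeps a dictionary of outgoing transitions, and `supply[i]` stores the supply (suffix) link of state i, with -1 for state 0. The function names `build_factor_oracle`, `accepts` and `suffix_oracle_terminals` are hypothetical. For the suffix oracle we assume, as we understand the construction, that the terminal states are exactly those on the supply-link path from state m down to state 0.

```python
def build_factor_oracle(p: str) -> tuple[list[dict[str, int]], list[int]]:
    """On-line construction: add the letters of p one at a time."""
    trans: list[dict[str, int]] = [{}]  # trans[i][c] = target state
    supply: list[int] = [-1]            # supply link of each state; -1 for state 0
    for m, c in enumerate(p):           # m is the current last state
        new = m + 1
        trans.append({})
        trans[m][c] = new               # internal transition spelling p
        k = supply[m]
        # Walk the supply links, adding external transitions to the new state
        while k > -1 and c not in trans[k]:
            trans[k][c] = new
            k = supply[k]
        supply.append(0 if k == -1 else trans[k][c])
    return trans, supply


def accepts(trans: list[dict[str, int]], w: str) -> bool:
    """True if w labels a path from the initial state 0 (all states are final)."""
    state = 0
    for c in w:
        if c not in trans[state]:
            return False
        state = trans[state][c]
    return True


def suffix_oracle_terminals(supply: list[int]) -> set[int]:
    """Assumed suffix-oracle terminals: the states on the supply path from m."""
    terminals: set[int] = set()
    k = len(supply) - 1
    while k > -1:
        terminals.add(k)
        k = supply[k]
    return terminals


p = "abbbaab"
trans, supply = build_factor_oracle(p)
print(len(trans))                   # 8, i.e. m + 1 states
print(sum(len(t) for t in trans))   # 11 transitions
print(all(accepts(trans, p[i:j])
          for i in range(len(p) + 1) for j in range(i, len(p) + 1)))  # True
print(sorted(suffix_oracle_terminals(supply)))  # [0, 2, 7]
```

On the word abbbaab the sketch also accepts aba, which is not a factor, so the oracle can recognize strictly more than the factors of p.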

Similar resources

Statistical Properties of Factor Oracles

Factor and suffix oracles were introduced in [1] to provide an economical and efficient way of storing, respectively, all the factors and all the suffixes of a given text. Whereas good estimates exist for the worst-case size of the factor/suffix oracle, no average-case analysis has been done until now. In this paper, we give an estimation of the average size for the factor/suff...

A detail analysis on factor oracle construction of computing repeated factors

We give a detailed implementation of a linear-time and linear-space method, introduced in [3], to compute the length of a repeated suffix for each prefix of a given word p. This method is based on the factor oracle [1] of p, a deterministic acyclic automaton accepting all substrings of p. Keywords: factor oracle, suffix link, repetition
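
A hedged Python sketch of how repeated-suffix lengths can be computed during the on-line construction follows. It is not the implementation from [3] or from the paper above: the helper names `oracle_lrs` and `length_common_suffix` are hypothetical, and the update rule follows the common-suffix idea as we understand it (compare the last state that received the new transition with the state just before the supply target). The guard `pi2 > 0` is our own defensive addition. As the abstract says, each value is the length of a repeated suffix; we do not claim it is always the longest one.

```python
def length_common_suffix(supply: list[int], lrs: list[int], pi1: int, pi2: int) -> int:
    """Length of a common suffix of the prefixes ending at states pi1 and pi2."""
    if pi2 == supply[pi1]:
        return lrs[pi1]
    # Climb supply links from pi2 until it shares pi1's supply target
    # (the pi2 > 0 guard is a defensive addition, not part of the described method).
    while pi2 > 0 and supply[pi2] != supply[pi1]:
        pi2 = supply[pi2]
    return min(lrs[pi1], lrs[pi2])


def oracle_lrs(p: str) -> list[int]:
    """lrs[i] = length of a repeated suffix of p[:i], computed on-line."""
    trans: list[dict[str, int]] = [{}]
    supply: list[int] = [-1]
    lrs: list[int] = [0]
    for m, c in enumerate(p):
        new = m + 1
        trans.append({})
        trans[m][c] = new
        pi1, k = m, supply[m]      # pi1: last state that received a transition
        while k > -1 and c not in trans[k]:
            trans[k][c] = new
            pi1, k = k, supply[k]
        if k == -1:                # letter c is new: no repeated suffix
            supply.append(0)
            lrs.append(0)
        else:
            s = trans[k][c]
            supply.append(s)
            lrs.append(length_common_suffix(supply, lrs, pi1, s - 1) + 1)
    return lrs


print(oracle_lrs("abbbaab"))  # [0, 0, 0, 1, 2, 1, 1, 2]
```

For abbbaab, the last entry 2 corresponds to the suffix ab, which also occurs at the start of the word.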

Combinatorial Characterization of the Language Recognized by Factor and Suffix Oracles

Sequence analysis requires data structures that allow both efficient storage and efficient use. Among these, we can cite tries [1], suffix automata [1, 2] and suffix trees [1, 3]. Cyril Allauzen, Maxime Crochemore and Mathieu Raffinot introduced [4, 5, 6] a new data structure, linear in the size of the represented word in both time and space, having the smallest number of states, and allowi...

Error analysis of factor oracles

Factor oracles [1] constructed from a given text are deterministic acyclic automata accepting all substrings of the text. Factor oracles are more space-economical and easier to implement than similar data structures such as suffix trees [6]. There is, however, one drawback: a factor oracle may accept strings that are not in the text, which we call an error acceptance. In this paper, we characterize factor or...
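
The error acceptance mentioned above can be reproduced on a small word. The Python sketch below is our own illustration (the names `build_factor_oracle` and `accepted_words` are hypothetical): it builds the factor oracle with the usual on-line construction, enumerates every word it accepts, which is a finite set because the automaton is acyclic and all its states are final, and removes the true factors of the text.

```python
def build_factor_oracle(p: str) -> list[dict[str, int]]:
    """Transitions of the factor oracle of p, built letter by letter."""
    trans: list[dict[str, int]] = [{}]
    supply: list[int] = [-1]
    for m, c in enumerate(p):
        trans.append({})
        trans[m][c] = m + 1
        k = supply[m]
        while k > -1 and c not in trans[k]:
            trans[k][c] = m + 1
            k = supply[k]
        supply.append(0 if k == -1 else trans[k][c])
    return trans


def accepted_words(trans: list[dict[str, int]]) -> set[str]:
    """Labels of all paths from state 0 (finite because the automaton is acyclic)."""
    words: set[str] = set()
    stack = [(0, "")]
    while stack:
        state, w = stack.pop()
        words.add(w)
        for c, target in trans[state].items():
            stack.append((target, w + c))
    return words


text = "abbbaab"
factors = {text[i:j] for i in range(len(text) + 1) for j in range(i, len(text) + 1)}
errors = accepted_words(build_factor_oracle(text)) - factors
print(sorted(errors))  # ['aba', 'abaa', 'abaab', 'abba', 'abbaa', 'abbaab']
```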

A new taxonomy of sublinear keyword pattern matching algorithms

This paper presents a new taxonomy of sublinear (multiple) keyword pattern matching algorithms. Based on an earlier taxonomy by Watson and Zwaan [WZ96, WZ95], this new taxonomy includes not only suffix-based algorithms related to the Boyer-Moore, Commentz-Walter and Fan-Su algorithms, but also factor- and factor oracle-based algorithms such as Backward DAWG Matching and Backward Oracle Matching as well...
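
Backward Oracle Matching, named above, can be sketched as follows under our own assumptions (a hedged Python sketch, not the taxonomy's or any library's code; `build_factor_oracle` and `bom_search` are hypothetical names). The factor oracle is built on the reversed pattern, and each window of the text is read from right to left. If a letter cannot be read, the part of the window read so far, including that letter, is not a factor of the pattern, so the window can be shifted just past that letter. If the whole window is read, it must equal the pattern, because every transition increases the state number, so the only word of length m labelling a path from state 0 is the one spelled by the states 0, 1, ..., m.

```python
def build_factor_oracle(p: str) -> list[dict[str, int]]:
    """Transitions of the factor oracle of p, built letter by letter."""
    trans: list[dict[str, int]] = [{}]
    supply: list[int] = [-1]
    for m, c in enumerate(p):
        trans.append({})
        trans[m][c] = m + 1
        k = supply[m]
        while k > -1 and c not in trans[k]:
            trans[k][c] = m + 1
            k = supply[k]
        supply.append(0 if k == -1 else trans[k][c])
    return trans


def bom_search(pattern: str, text: str) -> list[int]:
    """Start positions of pattern in text (pattern assumed non-empty)."""
    m, n = len(pattern), len(text)
    trans = build_factor_oracle(pattern[::-1])  # oracle of the reversed pattern
    matches: list[int] = []
    pos = 0
    while pos <= n - m:
        state, j = 0, m - 1
        # Read the window text[pos:pos+m] from right to left
        while j >= 0 and text[pos + j] in trans[state]:
            state = trans[state][text[pos + j]]
            j -= 1
        if j < 0:          # whole window read: it equals the pattern
            matches.append(pos)
            pos += 1       # simplest safe shift after a match
        else:              # text[pos+j:pos+m] is not a factor of the pattern
            pos += j + 1
    return matches


print(bom_search("abba", "abbabbaabba"))  # [0, 3, 7]
```

Shifting by one after a match is the simplest safe choice; a longer shift would need extra information about the pattern.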

Publication date: 2007